********************************************************************************
** 	TITLE:		esN2019_polls		                                          ** 	
**	AUTHOR:	    Philippe Mongrain                                             **
**	DATE:		October 2022 					                              **	
**  VERSION:	Stata 16					                                  **	
********************************************************************************

* Version control

version 16.0

* Import data

import excel surveys, firstrow sheet(esN2019) clear
destring _all, replace
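
* Optional check (a sketch; assumes the sheet provides polldate and the party
* columns used below): stop early if any of them is absent or was left as a
* string by destring.

confirm numeric variable polldate psoe pp cs up vox erc jxcat pnv ehbildu ///
    compromis ccnca na prc maspais cup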

* Format survey date (polldate comes from the imported sheet)

format %tdMon_DD,_CCYY polldate

* Generate election date

gen electiondate = mdy(11, 10, 2019)

format %tdMon_DD,_CCYY electiondate

* Time of survey

gen time = (electiondate - polldate)
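
* Optional check (a sketch, not part of the original workflow): count polls
* dated on or after election day. Only time == 0 and missing time are dropped
* before saving, so negative values would remain in the final file.

count if time <= 0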

* Generate mean vote intention value by day

foreach p in psoe pp cs up vox erc jxcat pnv ehbildu compromis ccnca na prc maspais cup {
    bysort time : egen `p'vote = mean(`p')
}

sort time

* Drop duplicates

duplicates drop polldate, force

* Reshape the dataset (long format, one row per poll date and party),
* using the daily mean vote intentions computed above

foreach p in psoe pp cs up vox erc jxcat pnv ehbildu compromis ccnca na prc maspais cup {
    rename `p'vote v_`p'
}

reshape long v_, i(polldate) j(party) string

rename v_ vote

keep polldate electiondate party vote poll time

order poll polldate electiondate time party vote

* Generate rank of parties (1 = highest vote intention)
* Ties are broken alphabetically by party so that the ranking is reproducible;
* parties with missing vote values are ranked last.

gen negvote = -vote
bysort polldate (negvote party) : gen rank = _n
drop negvote

bysort polldate (rank) : gen winner = party[1]
bysort polldate (rank) : gen runnerup = party[2]
bysort polldate (rank) : gen thirdplace = party[3]

* Generate poll margin (leader minus runner-up)

bysort polldate (rank) : gen pollmar = vote[1] - vote[2]
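
* Optional check (a sketch): with parties ordered from highest to lowest vote
* within each poll date, the leader's margin should never be negative.

assert pollmar >= 0 if !missing(pollmar)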

* Drop duplicates

duplicates drop polldate, force

* Misleading poll

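* A poll is coded 0 when it shows PSOE leading with a margin of at least 1,
* and 1 otherwise; it is left missing when the margin itself is missing.

gen byte misleading = !(winner == "psoe" & pollmar >= 1) if !missing(pollmar)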
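
* Optional check (a sketch): tabulate the indicator, including any missing values.

tabulate misleading, missing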

* Save

drop if time == 0 | time == .

keep polldate pollmar misleading time

save "esN2019_polls.dta", replace